feat(MTP): per-test coverage analysis for MTP runner by piotr-nawrot-golba-music · Pull Request #1 · piotr-nawrot-golba-music/stryker-net

piotr-nawrot-golba-music · 2026-04-01T15:45:17Z

Summary

Implements per-test coverage capture for the Microsoft Testing Platform (MTP) test runner by running each test in an isolated server process.

Key changes:

SingleMicrosoftTestPlatformRunner.StopAndRemoveServerAsync() — stops server and removes from cache, triggering ProcessExit coverage flush
SingleMicrosoftTestPlatformRunner.RunSingleTestForCoverageAsync() — runs one test, stops server, reads per-test coverage file
MicrosoftTestPlatformRunnerPool.CaptureCoverageTestByTest() — iterates all tests using the runner pool for parallelism
CaptureCoverage() routing — uses per-test capture when CoverageBasedTest flag is set, aggregate otherwise
Confidence level: Normal for perTest, Exact for perTestInIsolation

Why process restart: MTP doesn't have an in-process data collector like VsTest's CoverageCollector. Since MutantControl only flushes coverage data on ProcessExit, the most reliable way to get per-test coverage is to stop and restart the server between tests. This is a one-time cost during the coverage capture phase.

Test plan

Unit tests pass: dotnet test src/Stryker.TestRunner.MicrosoftTestPlatform.UnitTest/ (150 pass)
Full solution builds: dotnet build src/Stryker.slnx (0 errors)
Full solution tests pass (1784+ tests, 0 failures)
Manual test with xUnit v3 project using --coverage-analysis perTest --test-runner mtp
Manual test with xUnit v3 project using --coverage-analysis perTestInIsolation --test-runner mtp
Manual test with --coverage-analysis all --test-runner mtp (aggregate fallback)

Add Testing Platform Support stryker-mutator/stryker-net#3094 (MTP support tracking)
Stryker.NET doesn't handle xUnit v3 properly stryker-mutator/stryker-net#3117 (xUnit v3 support)
chore: Rework MTP coverage analysis stryker-mutator/stryker-net#3460 (MTP coverage rework)
Upstream PR: feat(MTP): per-test coverage analysis for MTP runner stryker-mutator/stryker-net#3516

…ureCoverage Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ution MTP runner now captures per-test coverage by running each test in an isolated process. When coverage-analysis is set to perTest or perTestInIsolation, each test gets its own MTP server process. The server is stopped after each test, triggering MutantControl.FlushCoverageToFile() via ProcessExit, and the resulting coverage file is read to build per-test coverage results. This enables Stryker to determine which tests cover which mutants for MTP-based frameworks (xUnit v3, TUnit, MSTest with MTP, NUnit with MTP), unlocking coverage-based test optimization that was previously only available with VsTest. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

piotr-nawrot-golba-music

PR Review: MTP Per-Test Coverage Capture

✅ Overall Verdict: Architecture is sound, will work for xUnit + MTP with file coverage

All 150 unit tests pass. The core flow (start server → run one test → stop server → ProcessExit triggers FlushCoverageToFile → read file) is correctly implemented and framework-agnostic — xUnit v3, TUnit, MSTest, NUnit via MTP all work the same way.

🟢 What works well

Correct routing: CaptureCoverage correctly routes perTest → CoverageBasedTest (Normal confidence) and perTestInIsolation → CaptureCoveragePerTest | CoverageBasedTest (Exact confidence). The all/off modes correctly fall through to aggregate path.
Process isolation is correct: StopAndRemoveServerAsync removes the server from _assemblyServers, so GetOrCreateServerAsync always starts a fresh process for the next test. Each process gets its own MutantControl static state — no cross-test contamination.
No race conditions on coverage files: Each runner has a unique _coverageFilePath (stryker-coverage-{id}.txt), and WaitServerProcessExitAsync blocks until the process exits (guaranteeing FlushCoverageToFile completes before the read).
File format consistency: Writer (MutantControl) and reader (ReadCoverageData) both use the "covered;static" format with comma-separated IDs. ✅
Integration with CoverageAnalyser: Returns ICoverageRunResult with the same structure as VsTest — the analyser processes them identically.

🟡 Issues to address

1. Silent coverage failure masks mutants as uncovered (Medium severity)

In RunSingleTestForCoverageAsync (SingleMicrosoftTestPlatformRunner.cs:134-172):

If StopAsync times out and force-kills the process, FlushCoverageToFile never runs. ReadCoverageData returns empty arrays, but the result still gets Normal/Exact confidence. The CoverageAnalyser then believes this test covers zero mutants — effectively hiding those mutants from testing.

Suggestion: After ReadCoverageData, if both lists are empty, consider logging a warning or downgrading to Dubious confidence. This way the analyser won't trust the empty result and will still run those mutants against this test.

2. Misleading test name: `RunSingleTestForCoverageAsync_ShouldReturnDubious_WhenNoCoverageFile` (Low severity)

This test (SingleMicrosoftTestPlatformRunnerCoverageTests.cs:554-570) tests ReadCoverageData() returning empty arrays — it does not test the CoverageConfidence.Dubious path. The Dubious confidence is only set in the catch block of RunSingleTestForCoverageAsync (line 160-170), which this test doesn't exercise. Similarly, RunSingleTestForCoverageAsync_ShouldReturnCoverageFromFile (line 518) only tests ReadCoverageData, not the actual RunSingleTestForCoverageAsync method.

Suggestion: Rename tests to match what they actually test (ReadCoverageData_...), or add a test that exercises the real RunSingleTestForCoverageAsync exception path via a TestableRunner with a throwing coverageHandler.

3. Missing test for the exception/Dubious path (Low severity)

There's no test that verifies RunSingleTestForCoverageAsync returns CoverageConfidence.Dubious when an exception occurs (e.g., server fails to start, test fails to run). Consider adding a test using TestableRunner with a coverageHandler that throws.

4. Performance consideration — no issue, but worth documenting

Starting/stopping one process per test means N process startups for N tests. For large test suites (1000+ tests), this will be noticeably slower than VsTest's in-process data collector approach. This is an acceptable trade-off since MTP doesn't support data collectors, but worth noting in documentation so users understand the impact of perTest/perTestInIsolation with MTP runners.

🟢 Summary

Aspect	Status
Per-test server isolation	✅ Correct
ProcessExit → FlushCoverageToFile	✅ Correct
Coverage file read/write	✅ Correct, format-consistent
Race conditions	✅ None found
xUnit + MTP compatibility	✅ Framework-agnostic
Integration with CoverageAnalyser	✅ Correct
Error handling	⚠️ Silent empty coverage needs attention
Test coverage of new code	⚠️ Good routing tests, but runner-level tests don't exercise real `RunSingleTestForCoverageAsync`

Bottom line: Yes, running mutations with xUnit and MTP with file coverage watching will work. The architecture is correct and the flow is sound. The issues above are about edge-case resilience and test accuracy, not correctness of the happy path.

When RunSingleTestForCoverageAsync gets empty coverage data (e.g., server force-killed before FlushCoverageToFile ran), the result now correctly uses CoverageConfidence.Dubious instead of the requested confidence level. This prevents silently marking mutants as uncovered when coverage capture failed. Also fixes misleading test names (renamed RunSingleTestForCoverageAsync_* to ReadCoverageData_* where they only tested ReadCoverageData), and adds proper tests for the Dubious confidence paths through the pool. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

piotr-nawrot-golba-music

Code Review: MTP Per-Test Coverage Capture

Verdict: No significant issues found. The implementation is correct and well-structured.

I thoroughly investigated the concurrency model, coverage file IPC coordination, error handling paths, and data parsing robustness. All 152 unit tests pass.

What I verified:

Concurrency is safe. \Parallel.ForEach\ with \MaxDegreeOfParallelism = _countOfRunners\ combined with \RunThisAsync's \ConcurrentBag.TryTake/Add\ pattern guarantees each runner instance is used by exactly one thread at a time. No race conditions on _assemblyServers\ (protected by _serverLock), and no contention on coverage files (unique path per runner via \stryker-coverage-{id}.txt).
Coverage file coordination is correct. Runner writes \STRYKER_COVERAGE_FILE\ env var (filename only), child test process reconstructs full path via \Path.GetTempPath(). Both sides resolve to the same file. Verified the flow: \DeleteCoverageFile → GetOrCreateServer → RunTest → StopAndRemoveServer → ReadCoverageData → DeleteCoverageFile.
Error handling is comprehensive. Force-killed process (30s timeout in \StopAsync) → empty coverage file → \Dubious\ confidence. Server startup failure → caught by \RunSingleTestForCoverageAsync\ catch-all → \Dubious. \ReadCoverageData\ handles missing file, empty content, malformed data (\int.TryParse\ + filter), and I/O exceptions.
Routing logic is correct. \CoverageBasedTest\ flag → per-test with \Normal\ confidence. \CoverageBasedTest | CaptureCoveragePerTest\ → per-test with \Exact\ confidence. Other modes → aggregate fallback.
\ParseMutantIds\ is robust against partial writes — uses \TryParse\ with \Where(HasValue)\ to silently skip non-numeric entries from a partially-flushed file.

piotr-nawrot-golba-music · 2026-04-01T16:40:27Z

Concurrency Improvement Suggestions

These aren't blocking issues — the current code is correct given the pool's single-threaded-per-runner invariant — but they'd make the concurrency model more robust and efficient.

1. \RunThisAsync\ — Replace \AutoResetEvent\ with \SemaphoreSlim\ (high value)

File: \MicrosoftTestPlatformRunnerPool.cs, lines 239–276

\RunThisAsync\ currently uses \AutoResetEvent.WaitOne(1000)\ in a polling loop to wait for an available runner. This blocks a ThreadPool thread for up to 1 second per iteration while waiting. Under high concurrency (many mutations queued via \Parallel.ForEach\ or concurrent \TestMultipleMutantsAsync\ calls), this can cause ThreadPool starvation.

\\csharp
// Current: blocks a thread, polls every 1s
while (!_availableRunners.TryTake(out runner))
{
if (!_runnerAvailableHandler.WaitOne(waitIntervalMs)) // ← blocks thread
{
attempts++;
// ...
}
}
\\

Suggestion: Replace \AutoResetEvent _runnerAvailableHandler\ with \SemaphoreSlim(_countOfRunners, _countOfRunners)\ and use \�wait WaitAsync()\ instead of the polling loop. This makes runner checkout fully async (freeing the thread back to the pool while waiting) and eliminates the 1-second polling granularity:

\\csharp
// Proposed: fully async, no thread blocked while waiting
private readonly SemaphoreSlim _runnerSemaphore = new(_countOfRunners, _countOfRunners);

private async Task RunThisAsync(Func<SingleMicrosoftTestPlatformRunner, Task> task)
{
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(300));
await _runnerSemaphore.WaitAsync(cts.Token).ConfigureAwait(false);
// ... TryTake + try/finally { Add + Release }
}
\\

2. \GetOrCreateServerAsync\ — Replace dual \lock\ with \SemaphoreSlim(1,1)\ (correctness improvement)

File: \SingleMicrosoftTestPlatformRunner.cs, \GetOrCreateServerAsync\ (lines ~314–340)

This method has a TOCTOU (time-of-check-to-time-of-use) pattern — it acquires \lock(_serverLock)\ to check the cache, releases it, does \�wait server.StartAsync(), then re-acquires the lock to store the server:

\\csharp
lock (_serverLock)
{
if (_assemblyServers.TryGetValue(assembly, out server) && server.IsInitialized)
return server; // ← check
}
// gap: lock released, await happens here
server = new AssemblyTestServer(...);
await server.StartAsync(); // ← create + start (outside lock)
lock (_serverLock)
{
_assemblyServers[assembly] = server; // ← store
}
\\

If two threads call this for the same assembly simultaneously, both miss the cache, both create and start a server, and the second _assemblyServers[assembly] = server\ silently overwrites the first — leaking a running server process that is never stopped or disposed.

This is currently safe because \RunThisAsync\ guarantees single-threaded access per runner instance. But the invariant is enforced at the pool level, not the class level — making it fragile if the runner is ever used differently.

Suggestion: Replace \object _serverLock\ with \SemaphoreSlim(1,1)\ and hold it across the entire check-create-start-store sequence:

\\csharp
private readonly SemaphoreSlim _serverLock = new(1, 1);

private async Task GetOrCreateServerAsync(string assembly)
{
await _serverLock.WaitAsync().ConfigureAwait(false);
try
{
if (_assemblyServers.TryGetValue(assembly, out var server) && server.IsInitialized)
return server;

    server = new AssemblyTestServer(...);
    await server.StartAsync();
    _assemblyServers[assembly] = server;
    return server;
}
finally
{
    _serverLock.Release();
}

}
\\

This eliminates the race structurally, regardless of how the runner is used.

3. Minor: \CalculateAssemblyTimeout\ — redundant lock acquisition per LINQ element

File: \SingleMicrosoftTestPlatformRunner.cs, \CalculateAssemblyTimeout\

This method acquires _discoveryLock\ once per element inside a .Sum()\ LINQ lambda. Since _testsByAssembly\ is a plain \Dictionary, each iteration re-acquires the lock to read from it. It would be cleaner (and marginally faster for large test suites) to snapshot the dictionary once under a single lock acquisition, then iterate the snapshot outside the lock.

… for async-safe concurrency - Replace AutoResetEvent with SemaphoreSlim in RunnerPool.RunThisAsync to avoid blocking thread-pool threads during runner checkout - Replace object _serverLock with SemaphoreSlim(1,1) in SingleRunner to enable holding the lock across await in GetOrCreateServerAsync, eliminating the TOCTOU race in the check-create-start-store pattern - Fix CalculateAssemblyTimeout to snapshot _testDescriptions once instead of acquiring _discoveryLock per LINQ element Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…RunThisAsync Replace recursive self-call with a while loop sharing a single CancellationTokenSource so the 300-second timeout acts as a hard upper bound across all retries, preventing potential infinite loops if the semaphore/bag invariant is ever broken. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

piotr-nawrot-golba-music · 2026-04-01T19:08:37Z

Review Round 3 — Final Review + Allocation Analysis

Previous Comments Status ✅

All 3 suggestions from the first review have been addressed:

~~RunThisAsync — AutoResetEvent → SemaphoreSlim~~ ✅ Applied
~~GetOrCreateServerAsync — dual lock → single SemaphoreSlim(1,1) hold~~ ✅ Applied
~~CalculateAssemblyTimeout — lock-per-LINQ-element antipattern~~ ✅ Applied

The recursive retry bug found in review round 2 was also fixed → bounded while loop ✅

🐛 Bug: Stale server in error path leaks coverage data between tests

File: SingleMicrosoftTestPlatformRunner.cs, lines 184–195 (RunSingleTestForCoverageAsync catch block)

If server.RunTestsAsync throws (line 151), the catch block returns Dubious but never stops the server. The server stays alive in _assemblyServers with accumulated MutantControl._coveredMutants data. When the next test reuses this server for the same assembly, ProcessExit → FlushCoverageToFile writes combined coverage from both tests — polluting the second test's result.

Fix: Add best-effort cleanup in the catch block:

catch (Exception ex)
{
    _logger.LogWarning(ex, "{RunnerId}: Failed to capture coverage for test {TestId}", RunnerId, testId);
    try { await StopAndRemoveServerAsync(assembly).ConfigureAwait(false); }
    catch { /* best-effort cleanup */ }
    return CoverageRunResult.Create(testId, CoverageConfidence.Dubious, ...);
}

⚠️ Race: `_serverLock.Dispose()` after `Release()` can throw `ObjectDisposedException`

File: SingleMicrosoftTestPlatformRunner.cs, lines 676–677 (Dispose(bool))

_serverLock.Release();  // line 676 — unblocks a waiter
_serverLock.Dispose();  // line 677 — disposes immediately

Between Release() and Dispose(), a concurrent caller unblocked by Release() enters its critical section, then throws ObjectDisposedException when it calls _serverLock.Release() in its finally block. The _disposed field is a plain bool with no memory barrier and nothing checks it before acquiring the semaphore.

Fix: Don't dispose the SemaphoreSlim — it has no unmanaged resources and GC handles it. Or set a volatile bool _disposing flag and check it in all acquisition sites.

📊 Allocation Hot Spots (per-mutation paths)

These execute thousands of times during a mutation run and create measurable GC pressure:

#	Location	Issue	Severity	Fix
1	`RunnerId` property (line 36)	`$"MtpRunner-{_id}"` allocates a new string on every access (~28 call sites, ~6+ per mutation)	🔴 High	Cache as `readonly string _runnerId` in constructor
2	`RunTestsInternalAsync` (lines 599–616)	2x `.ToList()` + 3 LINQ iterators + re-filter for failures + per-test string interpolation. Re-iterates `finishedTests` 3 additional times	🔴 High	Single `foreach` loop building all lists in one pass
3	`_testDescriptions.Values.ToList()` (line 630)	Full collection copy per-mutation per-assembly	🔴 High	Pass `.Values` directly (already a snapshot of refs), or cache and invalidate on discovery
4	`CalculateAssemblyTimeout` (line 396)	`new Dictionary<>(_testDescriptions)` copies entire dictionary per-mutation	🔴 High	Hold lock for the brief `.Sum()` instead of copying
5	`CalculateAssemblyTimeout` (lines 400–401)	`ContainsKey` + `TryGetValue` = double hash lookup per test node	🟡 Low	Just `TryGetValue` in the `Sum`, drop the `Where`
6	Debug log (line 87)	`string.Join(",", mutants.Select(...))` evaluates even when Debug is off	🟠 Medium	Guard with `if (_logger.IsEnabled(LogLevel.Debug))`
7	`ParseMutantIds` (lines 303–308)	3 LINQ iterators + `int?` nullable boxing + `.Trim()` per token	🟠 Medium	`foreach` with `StringSplitOptions.TrimEntries`
8	`RegisterInitialTestResult` (line 607)	`new MtpTestResult(duration)` per-test per-mutation — overwrites "initial" on every mutation	🟠 Medium	Only register during initial test run, not mutation runs
9	`RunTestsInternalAsync` (lines 604–606)	`ContainsKey` + indexer `[]` = double dictionary lookup per test	🟡 Low	Use `TryGetValue`
10	`assemblies.Any()` (lines 87, 232)	Allocates enumerator on `IReadOnlyList`	🟡 Low	Use `.Count == 0`

Top 3 highest-impact fixes:

Cache RunnerId — trivial one-liner, eliminates ~30k+ string allocs per 5k-mutation run
Single-pass foreach in RunTestsInternalAsync — eliminates ~6 intermediate allocations on the hottest per-mutation path
Stop copying _testDescriptions — eliminates a large collection copy on every mutation

Review covers: concurrency correctness, resource leaks, allocation efficiency. All 152 unit tests pass.